Analyzing and visualizing the semantic coverage of Wikipedia and its authors
Authors: Todd Holloway, Miran Božicevic, and Katy Börner
Abstract
This paper presents a novel analysis and visualization of English Wikipedia data. Our specific interest is the analysis of basic statistics, the identification of the semantic structure and age of the categories in this free online encyclopedia, and the content coverage of its highly productive authors. The paper starts with an introduction of Wikipedia and a review of related work. We then introduce a suite of measures and approaches to analyze and map the semantic structure of Wikipedia. The results show that co-occurrences of categories within individual articles have a power-law distribution and, when mapped, reveal the nicely clustered semantic structure of Wikipedia. The results also reveal the content coverage of the article’s authors, although the roles these authors play are as varied as the authors themselves. We conclude with a discussion of major results and planned future work.
Summary of results for the nonspecialist: Wikipedia is a free ‘encyclopedia of everything’ that was started by Jimmy Wales on January 15, 2001. Less than five years after its creation, it comprises over 2,700,000 articles written by about 90,000 different contributors in 195 languages. This paper provides basic statistics and analyzes and maps the semantic structure of the English Wikipedia as well as the activity of its major authors.
Holloway, Todd, Božicevic, Miran and Börner, Katy. (2007) Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors. Complexity, Special issue on Understanding Complex Systems. Vol. 12(3), pp. 30-40. Also available as cs.IR/0512085.
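As a rough illustration of the category co-occurrence counting mentioned in the abstract, the sketch below counts how often each pair of categories appears together in one article and then tabulates how many pairs share each count. The function names and the toy data are assumptions made for illustration only; they are not the authors' method or data.

```python
from collections import Counter
from itertools import combinations


def category_cooccurrences(article_categories):
    """Count how often each unordered pair of categories shares an article."""
    counts = Counter()
    for categories in article_categories.values():
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    return counts


def frequency_distribution(pair_counts):
    """Map each co-occurrence count to the number of pairs with that count."""
    return Counter(pair_counts.values())


# Hypothetical toy data, not drawn from Wikipedia.
articles = {
    "A": ["Physics", "Science"],
    "B": ["Physics", "Science", "History"],
    "C": ["History", "Science"],
}
pairs = category_cooccurrences(articles)
dist = frequency_distribution(pairs)
```

On real data, one common way to check for a power law is to plot the values in `dist` against their keys on log-log axes and look for an approximately straight line.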
Similar resources
Participation and Scientific Collaboration in Persian Wikipedia
Background and Aim: This research studies effective participation and scientific collaboration in Persian Wikipedia from 2003 to 2012. Method: The library method was used. Given the objectives and the nature of the subject, the research method is descriptive-applied, and scientometric techniques were used in its implementation. Excel and SPSS software have been used for...
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
On the Problem of Lexical Semantic Change
The article provides an insight into a problem of lexical semantic change. A short historical outline of the development of semantic studies is given. The authors analyze some of the most important stages in the history of the formation of this field. The existing approaches to dealing with form and meaning, namely semasiological and onomasiological ones are discussed. The authors consider the ...
Improving Persian Named Entity Recognition Using the Ezafe Marker
Named entity recognition is a process in which people’s names, place names (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Extracting Semantics Relationships between Wikipedia Categories
The Wikipedia is the largest online collaborative knowledge sharing system, a free encyclopedia. Built upon traditional wiki architectures, its search capabilities are limited to title and full-text search. We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which cou...
Journal title:
- Complexity
Volume 12, Issue 3
Pages 30-40
Publication date 2007